-
Notifications
You must be signed in to change notification settings - Fork 231
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
TokenType: Tests and optimization for pointer types #8060
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Requesting additional tests. I have also made a few educational questions that are lower priority, not blocking the merge of the PR, and can be addressed at a later stage.
@@ -1286,4 +1286,26 @@ public void M(object o) | |||
} | |||
} | |||
""", allowSemanticModel); | |||
|
|||
[DataTestMethod] | |||
[DataRow("var d = [u:dateTimePointer]->Date;", false)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Educational: could you please briefly explain when a token is of an unknown type (i.e. TokenType.UnknownTokentype
), and why the method parameter dateTimePointer
is of such type?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unknown is the default tokentype and is used for all tokens that are not of any of the other types:
public enum TokenType {
[pbr::OriginalName("UNKNOWN_TOKENTYPE")] UnknownTokentype = 0,
[pbr::OriginalName("TYPE_NAME")] TypeName = 1,
[pbr::OriginalName("NUMERIC_LITERAL")] NumericLiteral = 2,
[pbr::OriginalName("STRING_LITERAL")] StringLiteral = 3,
[pbr::OriginalName("KEYWORD")] Keyword = 4,
[pbr::OriginalName("COMMENT")] Comment = 5,
}
"Unknown" tokens are skipped when the info is written to the disc because there is no difference.
|
||
[DataTestMethod] | ||
[DataRow("var d = [u:dateTimePointer]->Date;", false)] | ||
[DataRow("var d = dateTimePointer->[u:Date];", false)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Educational: I guess Date
is of unknown type because dateTimePointer
is of unknown type. Is that correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
UnknownTokentype
here means the classification of the token isn't any of the other TokenTypes. It doesn't have anything to do with the actual .Net types involved. The enum is badly named. It should be called TokenClassification instead of TokenType.
Also:
For all identifier tokens, there can only be two possible TokenType outcomes: "TypeName" or "UnknownTokentype" *). If we can conclude that the token can never bind to a type, we are done. A pointer can never be a type. (On a normal member access, we can not be sure: Math.Pi
here Math
can bind to a type or a property. Here we need to consult the semantic model for Math
and see what it binds to.
*) One exception is "value" in property setters, which is classified as a keyword.
[DataRow("var d = [u:dateTimePointer]->Date;", false)] | ||
[DataRow("var d = dateTimePointer->[u:Date];", false)] | ||
[DataRow("var d = (*[u:dateTimePointer]).Date;", false)] | ||
[DataRow("var d = (*dateTimePointer).[u:Date];", false)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Educational: [u:dateTimePointer]
and [u:Date]
could be assessed in the same test, correct? Decorating the token dateTimePointer
with the token type assertion syntax doesn't impact the token Date
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure. I try to assert only single tokens as it eases the test debugging. Also, the allowSemanticModel
value may differ between the tokens. I just prefer the test isolation here (test a single thing).
[DataRow("System.[t:Int32]* iPointer;", false)] | ||
[DataRow("[k:void]* voidPointer;", false)] | ||
[DataRow("DateTime d = default; M(&[u:d]);", false)] | ||
public void IdentifierToken_Unsafe_Pointers(string statement, bool allowSemanticModel) => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are some additional test cases that come to mind:
int?* x1; // Pointer of a nullable value type ("?" shouldn't be part of the token, right?
Nullable<int>* x2; // Nullable is of type t, whereas int is of type k, both are recognized
int** x3; // ** is not included in the token int, which is of type k
And relative assignments:
_ = &x1; // x1 of type u
_ = (*(&x1))->Value; // x1 and Value of type u
_ = (*x1).Value; // x1 and Value of type u
There are also ways of mixing @
and &
, and we may want to make sure we only highlight the right thing:
var @int = (int*)null; // 2nd int and null of type k, 1st int of type u (?) with @ included
_ = *@int; // @int of type u (?), * not included, @ included
_ = &@int; // @int of type u (?), & not included, @ included
You can also similarly mix []
with *
and &
.
Hereafter is my understanding of one of the cases mentioned above (to be verified):
[DataRow("var [u:@int] = ([k:int]*)[k:null];_ = *[u:@int];", false)]
Would it make sense to cover (some of) them?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a couple of tests. The @
tests are a bit unrelated and I separated those. The @
is just parsed as part of the identifier token, so it doesn't really change anything.
@@ -196,6 +196,8 @@ name switch | |||
// that can not bind to a type. The only things that can bind to a type are SimpleNames (Identifier or GenericName) | |||
// or pre-defined types. None of the pre-defined types have a nested type, so we can exclude these as well. | |||
{ Parent: MemberAccessExpressionSyntax x } when AnyMemberAccessLeftIsNotASimpleName(x) => TokenType.UnknownTokentype, | |||
// The left side of a pointer member access must be a pointer and can not be a type |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Educational: this assertion still holds when using using unsafe
from C# 12, right?
using System;
using unsafe Ptr = int*;
internal class Program
{
private unsafe static void Main(string[] args)
{
int x = 3;
Ptr y = &x;
Console.WriteLine($"x = {y->ToString()}");
}
}
Ptr
should still be considered TokenType.UnknownTokentype
, and not a type. Correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It just works. Ptr
is in a type position and is therefore classified as type (There is a distinguishing happening before ClassifyMemberAccess
is called. An IdentfierSymtax
can either be in an expression or a type context. In a type context, a single IdentfierName resolves to TokenType.TypeName and in an expression context to UnknownTokentype).
724ce73
to
f1eb00e
Compare
f1eb00e
to
d5f8409
Compare
Kudos, SonarCloud Quality Gate passed! |
Kudos, SonarCloud Quality Gate passed! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks a lot for your explanations, they made the code much clearer to me!
@martin-strecker-sonarsource Converted to draft temporarily, to avoid merging by mistake, while the CI runs. |
Follow up to #7369